DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs
TABLE 4.2
Effect of including or omitting the reconstruction error and the
tangent-direction constraint on the ImageNet data set. The
architecture used for the experiments is DCP-NAS-L.

Tangent direction D(α̂)   Reconstruction error L_R(ŵ, β)   Top-1   Top-5
✗                         ✗                                 66.7    83.3
✓                         ✗                                 68.3    85.0
✗                         ✓                                 68.2    85.1
✓                         ✓                                 72.4    89.2
used for both the Parent and Child models. When applied to the Child model, w denotes
the weights reconstructed from the binarized weights, that is, w = β ◦ b_ŵ.
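The reconstruction w = β ◦ b_ŵ can be sketched as follows. This is a minimal NumPy illustration, not the DCP-NAS implementation: the text only states the elementwise form of the reconstruction, so the choice of β as the per-output-channel mean absolute value (in the style of XNOR-Net binarization) is an assumption here.

```python
import numpy as np

def reconstruct_binary_weights(w_hat):
    """Reconstruct real-valued weights from 1-bit weights.

    b_w = sign(w_hat) gives the binarized weights; beta is a
    per-output-channel scale (assumed here to be the mean absolute
    value, as in XNOR-style binarization) so that w = beta * b_w
    approximates w_hat.
    """
    b_w = np.sign(w_hat)          # 1-bit weights in {-1, +1}
    b_w[b_w == 0] = 1.0           # map exact zeros to +1 by convention
    # scale: mean |w_hat| over every axis except the output-channel axis
    beta = np.abs(w_hat).mean(axis=tuple(range(1, w_hat.ndim)),
                              keepdims=True)
    return beta * b_w             # reconstructed weights w

w_hat = np.array([[0.5, -1.5],
                  [2.0, -1.0]])
w = reconstruct_binary_weights(w_hat)
```

For the toy 2x2 filter above, each row's scale is its mean absolute value (1.0 and 1.5), so the reconstruction keeps each weight's sign while sharing one magnitude per output channel.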
4.4.7
Ablation Study
Effectiveness of Tangent Propagation In this section, we evaluate the effect of
tangent propagation on the performance of DCP-NAS; the hyperparameters involved
are λ and μ. We also discuss the effectiveness of the reconstruction
error. The implementation details are given below.
To search for a better binary neural architecture, λ and μ balance three terms: the
KL divergence f̃(ŵ, α̂, β) that supervises the Child, the reconstruction error for binary
weights L_R(ŵ, β), and the tangent-direction constraint D(α̂). We evaluated λ and μ on the
ImageNet data set with the DCP-NAS-L architecture. To better understand tangent
propagation on the large-scale ImageNet ILSVRC12 dataset, we examined how
the tangent-direction constraint affects performance. Based on the experiments described
above, we first set λ to 5e-3 and μ to 0.2 when they are used. As shown in Table 4.2, both
FIGURE 4.15
With different λ and μ, we evaluated the Top-1 accuracies of DCP-NAS-L
on ImageNet.
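The role of λ and μ described above can be sketched as a weighted sum of the three loss terms. This is only an illustration of the balancing: the real losses operate on network outputs, binary weights, and architecture parameters, and the text does not state which coefficient multiplies which term, so the pairing below (λ with the reconstruction error, μ with the tangent constraint) is an assumption.

```python
def total_child_loss(kl_div, recon_err, tangent_constraint,
                     lam=5e-3, mu=0.2):
    """Sketch of the Child objective: the KL divergence supervising the
    Child, plus the reconstruction error L_R weighted by lambda and the
    tangent-direction constraint D(alpha_hat) weighted by mu.

    Defaults follow the settings reported in the text (lambda = 5e-3,
    mu = 0.2); the term-to-coefficient pairing is assumed.
    """
    return kl_div + lam * recon_err + mu * tangent_constraint

# Example: with these weights, the KL term dominates and the two
# auxiliary terms act as small regularizers.
loss = total_child_loss(kl_div=1.0, recon_err=2.0, tangent_constraint=3.0)
```

With λ = 5e-3 and μ = 0.2, the reconstruction and tangent terms contribute only a small fraction of the total, which matches their role as constraints rather than the primary training signal.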